Spatial tendency

Spatial tendency analysis assesses whether expression features are influenced by spatial proximity to a region of interest (ROI). Expression features can be gene expression, pathway activity, cell proportion, and so on. The ROI can be defined by manual annotation or detected automatically by the “Spatial Domain” function. The “Spatial Tendency” module offers two kinds of methods for tendency estimation: statistical tests and regression models.

Here we use 10X Visium data from the human dorsolateral prefrontal cortex (DLPFC, sample 151676) as an example to demonstrate spatial tendency estimation.

The raw data are available at https://github.com/LieberInstitute/spatialLIBD. The processed file 1516761_10X_adata.h5ad is stored at https://zenodo.org/records/14588408.

Reading and preprocessing

[1]:
import warnings
warnings.filterwarnings("ignore")
import scanpy as sc
import pandas as pd
import SOAPy_st as sp
import matplotlib.pyplot as plt
[2]:
# input adata
adata = sc.read_h5ad('/csb2/project/SpatialPackage_whq/Tutorial/data/1516761_10X_adata.h5ad')

# make duplicated gene names unique (a suffix is appended; no genes are dropped)
adata.var_names_make_unique()

# preprocessing
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
sc.pp.filter_genes(adata, min_counts=10)
sc.pp.filter_genes(adata, min_cells=2)
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
[3]:
sc.pl.spatial(adata, img_key="hires", color='cluster')
../_images/Tutorials_Spatial_tendency_5_0.png

Generate a mask image from the domain cluster

To find genes whose expression changes with distance to the white matter, the white matter region is selected as the ROI and a mask image is constructed with sp.tl.get_mask_from_domain(). KSize is the size of the convolution kernel used for the dilation and erosion operations on the image; it must be an odd number.

[4]:
mask = sp.tl.get_mask_from_domain(adata, clusters='WM', KSize=35, cluster_key='cluster')
plt.imshow(mask, cmap='gray')
plt.show()
[11361 11260]
WARNING:root:adata has not been initialized, and adata.uns['SOAPy'] has been established
../_images/Tutorials_Spatial_tendency_8_2.png
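
The dilation and erosion steps mentioned above can be illustrated with a small pure-Python sketch of morphological closing on a binary grid. This is only an illustration under stated assumptions: the helper names morph() and close_mask() and the toy grid are hypothetical, and how sp.tl.get_mask_from_domain() builds its mask internally is not shown in this tutorial. The sketch also shows why an odd kernel size is required: the window needs a center pixel.

```python
def morph(img, ksize, combine):
    """Apply a ksize x ksize window to a binary grid.

    combine=any gives dilation, combine=all gives erosion.
    Pixels outside the grid are ignored.
    """
    if ksize % 2 == 0:
        raise ValueError("ksize must be odd so the window has a center pixel")
    r = ksize // 2
    h, w = len(img), len(img[0])
    return [
        [
            int(combine(
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def close_mask(img, ksize):
    """Morphological closing: dilation followed by erosion."""
    dilated = morph(img, ksize, any)
    return morph(dilated, ksize, all)


# Hypothetical toy grid: three isolated "spots" of one domain in a row.
spots = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
closed = close_mask(spots, 3)
for row in closed:
    print(row)
```

In this toy grid, closing fills the gaps between the three spots, so the separate spots become one connected region. A larger kernel bridges larger gaps.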

Statistical testing

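The cortical layers around the ROI are first divided into two regions, near and far from the ROI boundary, and expression in the two regions is then compared with the Wilcoxon test.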

[5]:
wilcoxon_res = sp.tl.wilcoxon_test(
    adata,
    mask,
    radius=1000,
    location='out',
    cut=500
)
[6]:
wilcoxon_res.head()
[6]:
   gene        stat       P value   effective spot near  effective spot far  P fdr     P rej
0  AL669831.5  1096392.0  0.003146  1234                 1750                0.041365  True
1  LINC00115   1081393.0  0.397687  1234                 1750                0.698975  False
2  FAM41C      1082177.0  0.358737  1234                 1750                0.668305  False
3  SAMD11      1076975.0  0.384899  1234                 1750                0.689967  False
4  NOC2L       1099105.0  0.185846  1234                 1750                0.491811  False
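
A rough sketch of the near/far comparison in plain Python: spots outside the ROI are split at a distance cut, and a Mann-Whitney U statistic (the statistic behind the Wilcoxon rank-sum test) is computed for the near group. The function names, the reading of radius and cut as distance thresholds, and the toy values are assumptions for illustration. sp.tl.wilcoxon_test() may differ in its details, and no p-value is computed here.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def near_far_u(distances, expression, radius, cut):
    """U statistic of the near group (0 < d <= cut) vs the far group (cut < d <= radius)."""
    near = [e for d, e in zip(distances, expression) if 0 < d <= cut]
    far = [e for d, e in zip(distances, expression) if cut < d <= radius]
    ranks = rank_with_ties(near + far)
    rank_sum_near = sum(ranks[:len(near)])
    u_near = rank_sum_near - len(near) * (len(near) + 1) / 2
    return u_near, len(near), len(far)


# Hypothetical distances to the ROI boundary and expression values of one gene.
distances = [100, 200, 300, 400, 600, 700, 800, 900, 1200]
expression = [5, 6, 7, 8, 1, 2, 3, 4, 9]
u, n_near, n_far = near_far_u(distances, expression, radius=1000, cut=500)
print(u, n_near, n_far)
```

In the toy data every near spot has higher expression than every far spot, so U reaches its maximum n_near * n_far. The spot at distance 1200 lies beyond the radius and is left out of both groups.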

Alternatively, the areas surrounding the ROI are divided into continuous zones parallel to its boundary, and the Spearman correlation coefficient is then calculated between the median expression of each zone and the zone's distance rank to the boundary.

[7]:
spearman_res = sp.tl.spearman_correlation(
    adata,
    mask,
    radius=1000,
    num=5
)
[8]:
spearman_res.head()
[8]:
   gene        coef       P value   P fdr     P rej
0  AL669831.5  0.032850   0.053347  0.125194  False
1  LINC00115   -0.011713  0.490979  0.639046  False
2  FAM41C      0.016004   0.346641  0.504274  False
3  SAMD11      -0.008111  0.633396  0.756434  False
4  NOC2L       0.046243   0.006516  0.023516  True
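
The zone-based Spearman idea described above can be sketched in plain Python as follows. This is a minimal illustration, assuming equal-width distance zones and hypothetical function names and toy data. It follows the textual description only and is not a reproduction of sp.tl.spearman_correlation().

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    return [
        sum(v > u for u in values) + (sum(v == u for u in values) + 1) / 2
        for v in values
    ]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def zone_spearman(distances, expression, radius, num):
    """Spearman correlation between zone distance rank and zone median expression."""
    width = radius / num
    zones = [[] for _ in range(num)]
    for d, e in zip(distances, expression):
        if 0 < d <= radius:
            # Zone k covers [k * width, (k + 1) * width); d == radius goes to the last zone.
            zones[min(int(d // width), num - 1)].append(e)
    zone_ranks = [k + 1 for k, z in enumerate(zones) if z]
    medians = [median(z) for z in zones if z]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(zone_ranks), average_ranks(medians))


# Hypothetical data: ten spots at increasing distance from the ROI boundary.
distances = [50, 150, 250, 350, 450, 550, 650, 750, 850, 950]
rising = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [1, 1, 5, 5, 9, 9, 5, 5, 1, 1]

rho_monotone = zone_spearman(distances, rising, radius=1000, num=5)
rho_peak = zone_spearman(distances, peaked, radius=1000, num=5)
print(rho_monotone, rho_peak)
```

The steadily rising gene gives a coefficient of about 1, while the symmetric "low-high-low" gene gives a coefficient of about 0. A rank correlation only captures monotonic trends, which is one reason to consider a regression model for more complex patterns.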

Regression

To resolve more complex spatial tendencies (e.g., nonlinear ones) or to analyze ROIs without a prior hypothesis, SOAPy provides a parametric regression method (polynomial regression) and a non-parametric regression method (LOESS regression).

Here we use polynomial regression as an example.

[9]:
sp.tl.spatial_tendency(
    adata,
    mask,
    radius=1000,
    location='out',
    frac=5
)
100%|█████████████████████████████████████████████████████████████████████████████████| 16120/16120 [00:57<00:00, 282.07it/s]
[9]:
AnnData object with n_obs × n_vars = 3460 × 16120
    obs: 'in_tissue', 'array_row', 'array_col', 'cluster', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'imagerow', 'imagecol'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_counts', 'n_cells'
    uns: 'spatial', 'log1p', 'cluster_colors', 'SOAPy'
    obsm: 'spatial'

The regression method describes the continuous spatial variation of expression, so it can detect more complex spatial patterns than the statistical tests, such as a nonlinear “low-high-low” change. PCP4 is an example.

Use sp.pl.show_tendency() to display the spatial tendency of a gene.

[10]:
sp.pl.show_tendency(adata, gene_name='PCP4', show=True)
../_images/Tutorials_Spatial_tendency_20_0.png

The actual spatial distribution of PCP4 is plotted for verification; it is consistent with the fitted trend.

[11]:
sc.pl.spatial(adata, img_key="hires", color='PCP4')
../_images/Tutorials_Spatial_tendency_22_0.png
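
To make the regression idea concrete, here is a minimal pure-Python sketch of a least-squares polynomial fit of expression against distance, applied to a synthetic "low-high-low" profile. The function names, the rescaling of distance to the range 0 to 1, and the data are assumptions for illustration. This is not the code behind sp.tl.spatial_tendency(), and solving the normal equations directly is only reasonable for low degrees and well-scaled inputs.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial fit; returns [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def predict(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


# Synthetic profile: distance rescaled to [0, 1], expression peaks mid-way.
dist = [i / 10 for i in range(11)]
expr = [1 - 4 * (d - 0.5) ** 2 for d in dist]

coef = polyfit(dist, expr, degree=2)
peak = -coef[1] / (2 * coef[2])  # vertex of the fitted parabola
print(coef, peak, predict(coef, peak))
```

A negative leading coefficient together with a vertex inside the fitted distance range is one simple way to recognise a "low-high-low" curve. Higher degrees allow more flexible shapes at the cost of a less stable fit.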

Clustering genes based on regression curves

The genes are clustered into 10 clusters according to the similarity of their regression curves, so that genes with similar spatial expression trends are grouped together. sp.tl.gene_cluster() performs the clustering, and sp.pl.show_curves_cluster() shows the spatial trend of each cluster.

[12]:
sp.tl.gene_cluster(adata=adata, k=10, range_min=0.03, fdr=True, pvalue=0.05)
sp.pl.show_curves_cluster(adata)
../_images/Tutorials_Spatial_tendency_25_0.png
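
As an illustration of grouping genes by curve shape, the sketch below standardises each curve and runs a small k-means in plain Python. The tutorial does not state which clustering algorithm sp.tl.gene_cluster() uses, so k-means is swapped in here as a common choice. The gene labels, curves, and helper names are hypothetical.

```python
def zscore(curve):
    """Standardise a curve so that clustering compares shape, not magnitude."""
    m = sum(curve) / len(curve)
    sd = (sum((v - m) ** 2 for v in curve) / len(curve)) ** 0.5
    return [(v - m) / sd for v in curve] if sd > 0 else [0.0] * len(curve)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves, k, n_iter=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [list(curves[0])]
    while len(centers) < k:
        farthest = max(curves, key=lambda c: min(sq_dist(c, m) for m in centers))
        centers.append(list(farthest))
    labels = [0] * len(curves)
    for _ in range(n_iter):
        # Assign each curve to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in curves]
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical fitted curves sampled at five distances from the ROI.
curves = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [10, 20, 30, 40, 50],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [9, 7, 5, 3, 1],
}
scaled = [zscore(c) for c in curves.values()]
labels = dict(zip(curves, kmeans(scaled, k=2)))
print(labels)
```

Because the curves are standardised first, geneA and geneB fall into the same cluster despite a tenfold difference in magnitude, and the two decreasing genes form the other cluster.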

Display of regression results

[13]:
adata.uns['SOAPy']['gene_cluster']['gene_cluster'].head()
[13]:
            p_value       param                                              range     correlation  Spots number  AIC            BIC            p_adjust      p_rej  cluster
HES4        5.633775e-04  [0.13264303892839413, -0.0006650334489480428, ...  0.101848  0.073997     2984          942.057636     978.063756     4.932996e-03  True   8
ISG15       1.319268e-14  [0.22827756017707687, 0.0005260869480460644, -...  0.300578  -0.141283    2984          2178.311971    2214.318090    6.041649e-13  True   1
AL390719.2  1.297120e-09  [0.0019080052657183982, 0.00029699148644908916...  0.106268  0.016078     2984          -8162.861969   -8126.855849   3.313721e-08  True   2
VWA1        9.577923e-13  [0.2811533391077294, -0.0018370393852305462, 6...  0.197323  -0.090514    2984          798.482892     834.489012     3.582277e-11  True   5
FNDC10      2.519375e-03  [0.014946894836472786, 0.0013003833923470607, ...  0.185635  0.023977     2984          84.695331      120.701451     1.725981e-02  True   4
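
The p_adjust and p_rej columns (and the P fdr and P rej columns of the test results earlier) report multiple-testing-corrected p-values and the resulting decisions. The tutorial does not say which correction SOAPy applies. The sketch below assumes the Benjamini-Hochberg procedure, a common choice for FDR control, and uses hypothetical p-values.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return (adjusted p-values, reject flags) under the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    rejected = [p <= alpha for p in adjusted]
    return adjusted, rejected


# Hypothetical raw p-values for four genes.
pvalues = [0.01, 0.04, 0.03, 0.20]
adjusted, rejected = benjamini_hochberg(pvalues, alpha=0.05)
print(adjusted, rejected)
```

Only the first gene remains significant after correction in this toy example, even though three of the four raw p-values are below 0.05.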